11 research outputs found

    Towards trustworthy phoneme boundary detection with autoregressive model and improved evaluation metric

    Phoneme boundary detection has been studied due to its central role in various speech applications. In this work, we point out that this task needs to be addressed not only through algorithms but also through the evaluation metric. To this end, we first propose a state-of-the-art phoneme boundary detector that operates in an autoregressive manner, dubbed SuperSeg. Experiments on the TIMIT and Buckeye corpora demonstrate that SuperSeg outperforms existing models by a significant margin in identifying phoneme boundaries. Furthermore, we note a limitation of the popular evaluation metric, R-value, and propose new evaluation metrics that prevent each boundary from contributing to the evaluation multiple times. The proposed metrics reveal the weaknesses of non-autoregressive baselines and establish a reliable criterion suited to evaluating phoneme boundary detection. Comment: 5 pages, submitted to ICASSP 202
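The evaluation issue raised in this abstract can be illustrated with a small sketch. The R-value formula below follows its commonly cited definition in terms of hit rate and over-segmentation; the one-to-one matching routine is only one plausible way to stop a boundary from being counted more than once and is not claimed to be the metric proposed by the authors. The function names and the 20 ms tolerance are assumptions made for this illustration.

```python
import math


def count_one_to_one_hits(ref, hyp, tol=0.02):
    """Count hits so that each reference and each hypothesis boundary
    is used at most once (greedy two-pointer matching; an assumption,
    not the paper's metric). Times are in seconds."""
    ref, hyp = sorted(ref), sorted(hyp)
    hits = i = j = 0
    while i < len(ref) and j < len(hyp):
        if abs(ref[i] - hyp[j]) <= tol:
            hits += 1          # consume both boundaries
            i += 1
            j += 1
        elif hyp[j] < ref[i]:
            j += 1             # unmatched hypothesis boundary
        else:
            i += 1             # unmatched reference boundary
    return hits


def r_value(n_hits, n_ref, n_hyp):
    """R-value from hit count, reference count and hypothesis count."""
    recall = n_hits / n_ref
    # Over-segmentation; algebraically equal to recall / precision - 1.
    over_seg = n_hyp / n_ref - 1
    r1 = math.sqrt((1 - recall) ** 2 + over_seg ** 2)
    r2 = (-over_seg + recall - 1) / math.sqrt(2)
    return 1 - (abs(r1) + abs(r2)) / 2


ref = [0.10, 0.50]
hyp = [0.09, 0.11, 0.50]   # two hypotheses crowd the first reference
hits = count_one_to_one_hits(ref, hyp)
score = r_value(hits, len(ref), len(hyp))
```

In this toy case the two hypothesis boundaries near 0.10 s yield a single hit, so the extra boundary is penalized as over-segmentation instead of being rewarded twice.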

    Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition

    Human Activity Recognition (HAR) systems have been extensively studied by the vision and ubiquitous computing communities due to their practical applications in daily life, such as smart homes, surveillance, and health monitoring. Typically, this process is supervised, and developing such systems requires access to large quantities of annotated data. However, the high cost and difficulty of obtaining good-quality annotations have made self-supervised methods an attractive option, and contrastive learning is one such method. A major component of successful contrastive learning is the selection of good positive and negative samples. Although positive samples are directly obtainable, sampling good negative samples remains a challenge. As human activities can be recorded by several modalities, such as camera and IMU sensors, we propose a hard negative sampling method for multimodal HAR with a hard negative sampling loss for skeleton and IMU data pairs. We exploit hard negatives that have different labels from the anchor but are projected nearby in the latent space, using an adjustable concentration parameter. Through extensive experiments on two benchmark datasets, UTD-MHAD and MMAct, we demonstrate the robustness of our approach for learning strong feature representations for HAR tasks, including in the limited-data setting. We further show that our model outperforms all other state-of-the-art methods on the UTD-MHAD dataset, and self-supervised methods on MMAct: Cross session, even when uni-modal data are used during downstream activity recognition
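A minimal sketch of how a concentration parameter can emphasize hard negatives in a contrastive loss is given below. It is not the loss from this paper: it assumes an InfoNCE-style objective in which each negative is reweighted in proportion to exp(beta * similarity), so that negatives lying close to the anchor contribute more. The function name, the parameter names `temperature` and `beta`, and the example similarity values are all hypothetical.

```python
import math


def hard_negative_contrastive_loss(pos_sim, neg_sims, temperature=0.1, beta=1.0):
    """InfoNCE-style loss for one anchor with importance-weighted negatives.

    pos_sim:  similarity between the anchor and its positive.
    neg_sims: similarities between the anchor and each negative.
    beta:     concentration parameter (assumed form); beta = 0 recovers
              the uniformly weighted loss, larger beta up-weights
              negatives that sit closer to the anchor.
    """
    raw = [math.exp(beta * s) for s in neg_sims]
    mean_raw = sum(raw) / len(raw)
    weights = [r / mean_raw for r in raw]   # weights average to 1
    pos = math.exp(pos_sim / temperature)
    neg = sum(w * math.exp(s / temperature) for w, s in zip(weights, neg_sims))
    return -math.log(pos / (pos + neg))


# Hypothetical cosine similarities: one hard negative (0.9), two easy ones.
uniform = hard_negative_contrastive_loss(0.8, [0.9, 0.1, -0.5], beta=0.0)
focused = hard_negative_contrastive_loss(0.8, [0.9, 0.1, -0.5], beta=2.0)
```

With these inputs the `beta=2.0` loss is larger than the `beta=0.0` loss, because the hard negative at 0.9 receives more weight in the denominator.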

    Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition

    To properly assist humans in their needs, human activity recognition (HAR) systems need the ability to fuse information from multiple modalities. Our hypothesis is that multimodal sensors, both visual and non-visual, tend to provide complementary information that addresses the limitations of the other modalities. In this work, we propose a multi-modal framework that learns to effectively combine features from RGB video and IMU sensors, and show its robustness on the MMAct and UTD-MHAD datasets. Our model is trained in two stages: in the first stage, each input encoder learns to effectively extract features, and in the second stage, the model learns to combine these individual features. We show significant improvements of 22% and 11% over the video-only and IMU-only setups on the UTD-MHAD dataset, and of 20% and 12% on the MMAct dataset. Through extensive experimentation, we show the robustness of our model in the zero-shot setting and the limited annotated data setting. We further compare with state-of-the-art methods that use more input modalities and show that our method significantly outperforms them on the more difficult MMAct dataset and performs comparably on the UTD-MHAD dataset

    E-Band Metasurface-Based Orbital Angular Momentum Multiplexing and Demultiplexing

    Orbital angular momentum (OAM) has received considerable attention regarding high-capacity communication owing to its spatial orthogonality. However, it is still challenging to build a compact communication system that can generate multiple coaxial OAM beams and receive information from each. In this work, OAM multiplexing and demultiplexing at the E-band frequency using a single metasurface structure is proposed and experimentally demonstrated. For OAM multiplexing, the metasurface used as a transceiver generates two orthogonal coaxial OAM beams for Gaussian incident beams with different incidence angles. For OAM demultiplexing, the same metasurface flipped 180° as a receiver forms a Gaussian beam in different off-axis directions depending on the topological charge of the coaxially incident OAM beam. The Gaussian beam measured at the receiver end exhibits a high signal-to-noise ratio of more than 33 dB compared with the background OAM beam. OAM multiplexing and demultiplexing based on a single metasurface may provide a route for high-capacity and compact free-space communication systems

    Climate Change and an Agronomic Journey from the Past to the Present for the Future: A Past Reference Investigation and Current Experiment (PRICE) Study

    According to numerous chamber and free-air CO2 enrichment (FACE) studies with artificially raised CO2 concentration and/or temperature, it appears that increasing atmospheric CO2 concentrations ([CO2]) stimulate crop yield. However, there is still controversy about the extent of the yield stimulation by elevated [CO2] and concern regarding the potential adverse effects when temperature rises concomitantly. Here, we tested the effects of naturally elevated [CO2] (ca. 120 ppm above the ambient level 100 years ago) and warming (ca. 1.7–3.2 °C above the ambient level 100 years ago) on rice growth and yield over three crop seasons via a past reference investigation and current experiment (PRICE) study. In 2020–2022, the rice cultivar Tamanishiki (Oryza sativa, ssp. japonica) was grown in Wagner’s pots (1/2000 a) at the experimental fields of Chonnam National University (35°10′ N, 126°53′ E), Gwangju, Korea, according to the pot trial methodology of the reference experiment conducted in 1920–1922. Elevated [CO2] and temperature over the last 100 years significantly stimulated plant height (13.4% on average), tiller number (11.5%), and shoot biomass (10.8%). In addition, elevated [CO2] and warming resulted in a marked acceleration of flowering phenology (6.8% or 5.1 days), potentially leading to adverse effects on tiller number and grain yield. While the harvest index exhibited a dramatic reduction (12.2%), grain yield remained unchanged with elevated [CO2] and warming over the last century. The response of these crop parameters to elevated [CO2] and warming was highly sensitive to sunshine duration during the period from transplanting to heading. Despite the pot-based observations, considering a piecewise response pattern of C3 crop productivity to [CO2] of 2] (+120 ppm) and moderate warming (+1.7–3.2 °C) in the absence of adaptation measures (e.g., cultivars and agronomic management practices). Hence, our results suggest that the PRICE platform may provide a promising way to better understand and forecast the net impact of climate change on major crops that have historical and experimental archived data, such as rice, wheat, and soybean